zero-one law



Zero-One Laws of Graph Neural Networks

Neural Information Processing Systems

Graph neural networks (GNNs) are the de facto standard deep learning architectures for machine learning on graphs. This has led to a large body of work analyzing the capabilities and limitations of these models, particularly pertaining to their representation and extrapolation capacity. We offer a novel theoretical perspective on the representation and extrapolation capacity of GNNs by answering the question: how do GNNs behave as the number of graph nodes becomes very large? Under mild assumptions, we show that when we draw graphs of increasing size from the Erdős–Rényi model, the probability that such graphs are mapped to a particular output by a class of GNN classifiers tends to either zero or one. This class includes the popular graph convolutional network architecture. The result establishes 'zero-one laws' for these GNNs and, analogously to other convergence laws, entails theoretical limitations on their capacity. We empirically verify our results, observing that the theoretical asymptotic limits are evident already on relatively small graphs.
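The convergence the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' experimental setup: a single untrained mean-aggregation (GCN-style) layer with random weights, i.i.d. Gaussian node features, mean pooling, and a threshold readout, evaluated on Erdős–Rényi graphs of growing size. The architecture, dimensions, and edge probability p = 0.5 are all assumptions made purely for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def er_graph(n, p):
    """Sample a symmetric Erdős–Rényi adjacency matrix without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def classify(adj, w, v, mu=1.0):
    """One mean-aggregation (GCN-style) layer, mean pooling, threshold readout."""
    n, d = adj.shape[0], w.shape[0]
    x = rng.normal(loc=mu, size=(n, d))       # i.i.d. node features (an assumption)
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # guard against isolated nodes
    h = np.tanh(((adj @ x) / deg) @ w)        # mean over neighbours, then nonlinearity
    return float(h.mean(axis=0) @ v > 0.0)    # hard graph-level classification

d = 8
w = rng.normal(size=(d, d))                   # random, untrained weights
v = rng.normal(size=d)

for n in [10, 50, 200, 1000]:
    labels = [classify(er_graph(n, 0.5), w, v) for _ in range(50)]
    # as n grows, the fraction of graphs assigned class 1 drifts towards 0 or 1
    print(f"n={n:4d}  fraction classified 1: {np.mean(labels):.2f}")
```

The intuition matches the theorem: as n grows, each node's mean-aggregated neighbourhood features concentrate around the feature mean, so the pooled embedding stabilises at a deterministic point and the thresholded classification becomes constant; the printed fraction tends to zero or one.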



Reviews: Towards a Zero-One Law for Column Subset Selection

Neural Information Processing Systems

As a caveat, I am not an expert in the literature surrounding low-rank reconstruction, and may not be entirely correct in my evaluation of the originality and significance of the contributions. Originality: This paper builds upon previous work, in particular [62], which developed column subset selection for low-rank approximation under the l_p norm. This paper expands upon [62], obtaining results for a broader class of functions and, furthermore, tightening and fixing some of its results. These extensions seem very valuable to the machine learning community. However, the authors may want to further motivate their work by providing specific examples of loss functions to which they extend previous theory, and which have found successful applications in machine learning.




Zero-One Laws of Graph Neural Networks

Adam-Day, Sam, Iliant, Theodor Mihai, Ceylan, İsmail İlkan

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) are the de facto standard deep learning architectures for machine learning on graphs. This has led to a large body of work analyzing the capabilities and limitations of these models, particularly pertaining to their representation and extrapolation capacity. We offer a novel theoretical perspective on the representation and extrapolation capacity of GNNs by answering the question: how do GNNs behave as the number of graph nodes becomes very large? Under mild assumptions, we show that when we draw graphs of increasing size from the Erdős–Rényi model, the probability that such graphs are mapped to a particular output by a class of GNN classifiers tends to either zero or one. This class includes the popular graph convolutional network architecture. The result establishes 'zero-one laws' for these GNNs and, analogously to other convergence laws, entails theoretical limitations on their capacity. We empirically verify our results, observing that the theoretical asymptotic limits are evident already on relatively small graphs.


Towards a Zero-One Law for Column Subset Selection

Song, Zhao, Woodruff, David, Zhong, Peilin

Neural Information Processing Systems

There are a number of approximation algorithms for NP-hard versions of low rank approximation, such as finding a rank-$k$ matrix $B$ minimizing the sum of absolute values of differences to a given $n$-by-$n$ matrix $A$, $\min_{\textrm{rank-}k~B}\|A-B\|_1$, or more generally finding a rank-$k$ matrix $B$ which minimizes the sum of $p$-th powers of absolute values of differences, $\min_{\textrm{rank-}k~B}\|A-B\|_p^p$. Many of these algorithms are linear-time column subset selection algorithms, returning a subset of $\mathrm{poly}(k \log n)$ columns whose cost is no more than a $\mathrm{poly}(k)$ factor larger than the cost of the best rank-$k$ matrix. The above error measures are special cases of the following general entrywise low rank approximation problem: given an arbitrary function $g:\mathbb{R} \rightarrow \mathbb{R}_{\geq 0}$, find a rank-$k$ matrix $B$ which minimizes $\|A-B\|_g = \sum_{i,j} g(A_{i,j}-B_{i,j})$. A natural question is: which functions $g$ admit efficient approximation algorithms? Indeed, this is a central question of recent work studying generalized low rank models. In this work we give approximation algorithms for {\it every} function $g$ which is approximately monotone and satisfies an approximate triangle inequality, and we show both of these conditions are necessary.
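To make the objective concrete, here is a minimal numpy sketch (an illustration, not the paper's algorithm): it evaluates the entrywise cost $\|A-B\|_g$ for $g(x) = |x|$, searches $k$-column subsets by brute force, and fits coefficients by ordinary least squares as a stand-in for a $g$-aware fit. The matrix sizes, the choice of $g$, and the fitting procedure are all assumptions for the demo.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def g_cost(a, b, g):
    """Entrywise cost ||A - B||_g = sum_{i,j} g(A_ij - B_ij)."""
    return g(a - b).sum()

def fit_from_columns(a, cols):
    """Best approximation of A within the span of the chosen columns of A.

    The fit is ordinary least squares (a Frobenius-norm proxy); the paper's
    algorithms fit with respect to g itself, which this sketch does not replicate.
    """
    c = a[:, list(cols)]                     # n x k column subset
    x, *_ = np.linalg.lstsq(c, a, rcond=None)
    return c @ x

n, k = 8, 2
# a noisy matrix with an approximately rank-k component
a = 5 * rng.normal(size=(n, k)) @ rng.normal(size=(k, n)) + rng.normal(size=(n, n))
g = np.abs                                   # g(x) = |x| gives the entrywise l1 cost

# brute-force search over all k-column subsets (viable only at this toy size)
best = min(combinations(range(n), k),
           key=lambda s: g_cost(a, fit_from_columns(a, s), g))
print("best columns:", best, " l1 cost:", round(g_cost(a, fit_from_columns(a, best), g), 2))

# reference point: best rank-k approximation under the Frobenius norm (SVD)
u, s, vt = np.linalg.svd(a)
print("rank-k SVD l1 cost:", round(g_cost(a, (u[:, :k] * s[:k]) @ vt[:k], g), 2))
```

The brute-force search is exponential in general; the point of the column subset selection algorithms discussed above is to find a competitive subset of $\mathrm{poly}(k \log n)$ columns in linear time, with cost within a $\mathrm{poly}(k)$ factor of the best rank-$k$ matrix.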